Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(agents-api): Fix for temporal running out of history size #687

Merged
merged 2 commits into from
Oct 17, 2024

Conversation

creatorrr
Copy link
Contributor

@creatorrr creatorrr commented Oct 17, 2024

Closes #678
Closes #679

Signed-off-by: Diwank Singh Tomer [email protected]


Important

This PR introduces remote data handling in the agents API to prevent Temporal from running out of history size by storing large inputs and outputs in a blob store.

  • Activities:
    • Add save_inputs_remote_fn and load_inputs_remote_fn in sync_items_remote.py to handle large inputs by storing them in a blob store.
    • Define save_inputs_remote and load_inputs_remote as activities.
  • Remote Handling:
    • Introduce RemoteObject, BaseRemoteModel, and RemoteList in remote.py to manage remote data storage and retrieval.
    • Modify StepContext in tasks.py to use RemoteObject for execution_input and inputs.
  • Storage:
    • Implement store_in_blob_store_if_large and load_from_blob_store_if_remote in storage_handler.py to manage large data.
    • Add auto_blob_store_workflow decorator for workflows to handle remote data.
  • Workflows:
    • Update TaskExecutionWorkflow in task_execution/__init__.py to use remote activities for input/output handling.
    • Modify helper functions in task_execution/helpers.py to support remote data handling.
  • Configuration:
    • Change blob_store_cutoff_kb in env.py and docker-compose.yml to 64KB and 128KB respectively for better data management.

This description was created by Ellipsis for f7879d3. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to f7879d3 in 46 seconds

More details
  • Looked at 834 lines of code in 14 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. agents-api/agents_api/common/protocol/remote.py:29
  • Draft comment:
    The use of activity.in_activity() in BaseRemoteModel and RemoteList methods may not be appropriate. Consider handling this logic outside of these classes to keep them agnostic of the execution context.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment is suggesting a design change, which is a valid point if the use of activity.in_activity() is indeed coupling these classes to a specific execution context. This could be a code quality issue worth addressing.
    The comment is speculative and suggests a design change without strong evidence that the current design is problematic. It might not be necessary to change the design if the current implementation is intentional and works as expected.
    While the comment is speculative, it raises a valid point about potential design improvement. However, without evidence of an issue, it might not be necessary to act on it immediately.
    The comment is speculative and suggests a design change without strong evidence of an issue. It should be removed unless there is a clear problem with the current implementation.
2. agents-api/agents_api/common/exceptions/tasks.py:125
  • Draft comment:
    The logic for HTTP status codes 408, 429, 503, and 504 is inverted. These should typically be considered retryable errors, but the current implementation marks them as non-retryable.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
3. agents-api/agents_api/common/storage_handler.py:21
  • Draft comment:
    Using sys.getsizeof may not accurately reflect the true size of serialized data. Consider using a more reliable method to determine the size of data for blob storage decisions.
  • Reason this comment was not posted:
    Comment was on unchanged code.
4. agents-api/agents_api/activities/sync_items_remote.py:13
  • Draft comment:
    Consider adding exception handling in the list comprehensions to manage potential errors during input processing.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The load_inputs_remote_fn and save_inputs_remote_fn functions in sync_items_remote.py use list comprehensions to process inputs. While this is efficient, it does not handle exceptions that may occur during processing, which could lead to unhandled errors.

Workflow ID: wflow_O56wI5InBTktbH7k


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@HamadaSalhab HamadaSalhab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@creatorrr creatorrr merged commit c77a1a4 into dev Oct 17, 2024
10 of 14 checks passed
@creatorrr creatorrr deleted the x/temporal-running-out-of-history-size branch October 17, 2024 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: worker fails to start if seaweedfs not ready [Bug]: Add msgpack dependency
2 participants